Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving. Researchers are usually constrained to study a small set of problems on one dataset, while real-world computer vision applications require performing tasks of various complexities. We construct BDD100K, the largest driving video dataset with 100K videos and 10 tasks to evaluate the exciting progress of image recognition algorithms on autonomous driving. The dataset possesses geographic, environmental, and weather diversity, which is useful for training models that are less likely to be surprised by new conditions. Based on this diverse dataset, we build a benchmark for heterogeneous multitask learning and study how to solve the tasks together. Our experiments show that special training strategies are needed for existing models to perform such heterogeneous tasks. BDD100K opens the door for future studies in this important venue.
Translated by Google Translate
Attention mechanisms form a core component of several successful deep learning architectures, and are based on one key idea: "The output depends only on a small (but unknown) segment of the input." In several practical applications like image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of input responsible for the output are often used as a way to peek into the "reasoning" of the network. We make such a notion more precise for a variant of the classification problem that we term selective dependence classification (SDC) when used with attention model architectures. Under such a setting, we demonstrate various error modes where an attention model can be accurate but fail to be interpretable, and show that such models do occur as a result of training. We illustrate various situations that can accentuate and mitigate this behaviour. Finally, we use our objective definition of interpretability for SDC tasks to evaluate a few attention model learning algorithms designed to encourage sparsity and demonstrate that these algorithms help improve interpretability.
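To make the idea of reading attention weights as a view into the network's "reasoning" concrete, here is a minimal sketch of a single soft-attention step. It is an illustration only, not the SDC setup from the abstract: the function name `soft_attention`, the scalar segment values, and the example scores are all assumptions chosen for brevity.

```python
import math


def soft_attention(scores, segment_values):
    """Return (weights, output) for one soft-attention step.

    scores: one unnormalised relevance score per input segment.
    segment_values: one scalar encoding per input segment (hypothetical;
    real models use vectors).
    """
    # Numerically stable softmax over the segment scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The output is the attention-weighted combination of the segments.
    output = sum(w * v for w, v in zip(weights, segment_values))
    return weights, output


# Illustrative inputs: the second segment receives by far the largest score.
weights, output = soft_attention([0.1, 4.0, -0.3], [1.0, 5.0, -2.0])

# The segment with the largest weight is the one usually read as
# "responsible" for the output.
focus = max(range(len(weights)), key=weights.__getitem__)
```

Reading `focus` as the explanation is only justified when the output really depends on that segment alone, which is exactly the assumption the abstract examines.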
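Deep reinforcement learning has recently become a popular approach to visual navigation tasks in the computer vision and robotics communities. In most cases the reward function has a binary structure: a large positive reward is given when the agent reaches the goal state, and a negative penalty is assigned to every other state in the environment. Such a sparse signal makes learning challenging, especially in large environments where many sequential actions are needed to reach the goal. We introduce a reward shaping mechanism that gradually adjusts the reward signal based on the distance to the goal. Detailed experiments in the AI2-THOR simulation environment demonstrate the efficacy of the proposed approach for object-goal navigation tasks.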
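The contrast between a sparse binary reward and a distance-based shaped reward can be sketched as follows. The abstract does not give the exact shaping formula, so this uses one common form as an assumption: the per-step reward is the step penalty plus a bonus proportional to the reduction in distance to the goal. The function names and the default values of `goal_reward`, `step_penalty`, and `scale` are hypothetical.

```python
def sparse_reward(reached_goal, goal_reward=10.0, step_penalty=-0.01):
    """Binary reward: large bonus at the goal, small penalty elsewhere."""
    return goal_reward if reached_goal else step_penalty


def shaped_reward(prev_dist, curr_dist, reached_goal,
                  goal_reward=10.0, step_penalty=-0.01, scale=1.0):
    """Distance-shaped reward (assumed form, not the paper's formula).

    prev_dist, curr_dist: distance to the goal before and after the step.
    Moving closer earns a positive bonus; moving away is penalised.
    """
    if reached_goal:
        return goal_reward
    return step_penalty + scale * (prev_dist - curr_dist)


# One step that moves the agent from 5.0 to 4.0 units away from the goal.
closer = shaped_reward(5.0, 4.0, reached_goal=False)
# One step that moves the agent from 4.0 to 5.0 units away from the goal.
farther = shaped_reward(4.0, 5.0, reached_goal=False)
```

Under the sparse reward both steps would receive the same `step_penalty`; the shaped version distinguishes them, which is the denser learning signal the abstract argues for.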
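The human-centric explainable artificial intelligence (HCXAI) community has proposed framing the explanation process as a conversation between human and machine. In this position paper, we establish desiderata for text-based conversational agents that can interactively explain the behavior of neural models in natural language. From the perspective of natural language processing (NLP) research, we design a blueprint of such a mediator for the task of sentiment analysis and assess how far current research has progressed toward dialogue-based explanations.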
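The increasing scale of neural networks and their growing application space have created demand for more energy-efficient and memory-efficient hardware specialized for artificial intelligence. Avenues for mitigating the main issue, the von Neumann bottleneck, include in-memory and near-memory architectures as well as algorithmic approaches. Here we exploit the low power consumption and inherently binary operation of magnetic tunnel junctions (MTJs) to demonstrate neural network hardware inference based on passive arrays of MTJs. In general, transferring a trained network model to hardware for inference degrades performance because of device-to-device variations, write errors, parasitic resistance, and nonidealities. To quantify the effect of these hardware realities, we benchmark 300 unique weight matrix solutions for classifying the Wine dataset, measuring both classification accuracy and write fidelity. Despite device imperfections, we achieve software-equivalent accuracy of up to 95.3% with proper tuning in 15 x 15 MTJ arrays having a range of device sizes. The success of this tuning process shows that new metrics are needed to characterize the performance and quality of networks reproduced in mixed-signal hardware.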
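Simulations of complex-valued Hopfield networks based on spin-torque oscillators can recover phase-encoded images. Sequences of memristor-augmented inverters provide tunable delay elements that implement complex weights by phase-shifting the oscillatory output of the oscillators. Pseudo-inverse training suffices to store at least 12 images in a set of 192 oscillators, representing $16 \times 12$ pixel images. The energy required to recover an image depends on the desired error level. For the oscillators and circuitry considered here, a 5% root-mean-square deviation from the ideal image requires approximately 5 $\mu$s and consumes roughly 130 nJ. Simulations show that the network functions well when the resonant frequencies of the oscillators can be tuned to have a fractional spread of less than $10^{-3}$, depending on the strength of the feedback.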
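Pseudo-inverse training of a complex-valued Hopfield network can be sketched in a few lines of NumPy. This is an idealized software illustration under stated assumptions, not the oscillator circuit simulated in the abstract: states are unit-modulus complex numbers encoding phase, the weight matrix is the projection $W = X X^{+}$ onto the stored patterns, and recall repeatedly applies $W$ and renormalizes each unit. The function names, random patterns, and pattern count are hypothetical; only the 192-unit size follows the text.

```python
import numpy as np


def train_pseudo_inverse(patterns):
    """Return the projection weight matrix W = X X^+.

    patterns: complex array of shape (n_units, n_patterns) whose entries
    have unit modulus (each entry encodes a phase).
    """
    return patterns @ np.linalg.pinv(patterns)


def recall(weights, state, n_steps=50):
    """Iterate the network: apply the weights, then keep only the phase."""
    for _ in range(n_steps):
        field = weights @ state
        # Renormalise to unit modulus; the floor avoids division by zero.
        state = field / np.maximum(np.abs(field), 1e-12)
    return state


rng = np.random.default_rng(0)
n_units, n_patterns = 192, 3  # 192 units, e.g. a 16 x 12 pixel image
phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_units, n_patterns))
patterns = np.exp(1j * phases)

W = train_pseudo_inverse(patterns)
# Each stored pattern is a fixed point of the recall dynamics.
recovered = recall(W, patterns[:, 0])
```

Because `W` projects onto the span of the stored patterns, every stored pattern satisfies `W @ p == p` up to rounding, which is why it survives the recall iteration unchanged. How well noisy inputs are restored depends on the number of patterns and the noise level, and is not shown here.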
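We study Markov decision processes (MDPs) in which states correspond to causal graphs that stochastically generate rewards. In this setting, the learner's goal is to identify atomic interventions that lead to high rewards by intervening on variables at each state. Generalizing the recent causal-bandit framework, the present work develops (simple) regret minimization guarantees for two-stage causal MDPs with a parallel causal graph at each state. We propose an algorithm that achieves an instance-dependent regret bound. A key feature of our algorithm is that it uses convex optimization to address the exploration problem. We identify classes of instances for which our regret guarantee is essentially tight, and we validate our theoretical results experimentally.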